 Skagerrak


More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play

arXiv.org Artificial Intelligence

The boardgame Diplomacy is a challenging setting for communicative and cooperative artificial intelligence. The most prominent communicative Diplomacy AI, Cicero, has excellent strategic abilities, exceeding human players. However, the best Diplomacy players master communication, not just tactics, which is why the game has received attention as an AI challenge. This work seeks to understand the degree to which Cicero succeeds at communication. First, we annotate in-game communication with abstract meaning representation to separate in-game tactics from general language. Second, we run two dozen games with humans and Cicero, totaling over 200 human-player hours of competition. While AI can consistently outplay human players, AI-Human communication is still limited because of AI's difficulty with deception and persuasion. This shows that Cicero relies on strategy and has not yet reached the full promise of communicative and cooperative AI.


Welfare Diplomacy: Benchmarking Language Model Cooperation

arXiv.org Artificial Intelligence

The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted language models; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.


Revealing interactions between HVDC cross-area flows and frequency stability with explainable AI

arXiv.org Artificial Intelligence

The energy transition introduces more volatile energy sources into the power grids. In this context, power transfer between different synchronous areas through High Voltage Direct Current (HVDC) links becomes increasingly important. Such links can balance volatile generation by enabling long-distance transport or by leveraging their fast control behavior. Here, we investigate the interaction of power imbalances - represented through the power grid frequency - and power flows on HVDC links between synchronous areas in Europe. We use explainable machine learning to identify key dependencies and disentangle the interaction of critical features. Our results show that market-based HVDC flows introduce deterministic frequency deviations, which, however, can be mitigated through strict ramping limits. Moreover, varying HVDC operation modes strongly affect the interaction with the grid. In particular, we show that load-frequency control via HVDC links can have either control-like or disturbance-like impacts on frequency stability.


Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

arXiv.org Machine Learning

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, and vector time series modeling, among other applications. The appeal of this factorization lies in its power to discover a highly interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly adopted yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea may be applicable to many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.


Short-Term Forecasting of CO2 Emission Intensity in Power Grids by Machine Learning

arXiv.org Machine Learning

A machine learning algorithm is developed to forecast the CO2 emission intensities in electrical power grids in the Danish bidding zone DK2, distinguishing between average and marginal emissions. The analysis was done on a data set comprising a large number (473) of explanatory variables, such as power production, demand, import, and weather conditions, collected from selected neighboring zones. The number was reduced to fewer than 50 using both LASSO (a penalized linear regression analysis) and a forward feature selection algorithm. Three linear regression models that capture different aspects of the data (such as non-linearities and coupling of variables) were created and combined into a final model using a softmax-weighted average. Cross-validation is performed for debiasing, and an autoregressive integrated moving average (ARIMA) model is implemented to correct the residuals, making the final model the variant with exogenous inputs (ARIMAX). The forecasts with the corresponding uncertainties are given for two time horizons, below and above six hours. Marginal emissions turned out to be independent of any conditions in the DK2 zone, suggesting that the marginal generators are located in the neighboring zones. The developed methodology can be applied to any bidding zone in the European electricity network without requiring detailed knowledge about the zone.
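The abstract above mentions combining three regression models with a softmax-weighted average but does not say how the weights are obtained. The following is only a minimal sketch of one plausible reading, assuming each model's weight comes from the softmax of its negated validation error. The names `combine_forecasts`, `validation_errors`, and `temperature` are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_forecasts(forecasts, validation_errors, temperature=1.0):
    """Blend per-model forecasts using softmax weights.

    forecasts: list of per-model forecast lists, all of equal length.
    validation_errors: one error value per model. Assumption: a lower
        error should yield a higher weight, so errors are negated.
    temperature: hypothetical knob; larger values flatten the weights.
    """
    weights = softmax([-err / temperature for err in validation_errors])
    horizon = len(forecasts[0])
    combined = [
        sum(w * model[t] for w, model in zip(weights, forecasts))
        for t in range(horizon)
    ]
    return combined, weights


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the paper.
    blended, w = combine_forecasts(
        [[100.0, 110.0], [120.0, 130.0], [90.0, 95.0]],
        [0.2, 0.5, 0.9],
    )
    print(w, blended)
```

With equal validation errors, this reduces to a plain average of the models. As the temperature shrinks, the blend approaches picking the single best model.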


Assessing the performance of statistical classifiers to discriminate fish stocks using Fourier analysis of otolith shape - Canadian Journal of Fisheries and Aquatic Sciences

#artificialintelligence

The assignment of individual fish to their stock of origin is important for reliable stock assessment and fisheries management. Otolith shape is commonly used as the marker of distinct stocks in discrimination studies. Our literature review showed that the application and comparison of alternative statistical classifiers to discriminate fish stocks based on otolith shape have been limited. Therefore, we compared the performance of two traditional and four machine learning classifiers based on Fourier analysis of otolith shape using selected stocks of Atlantic cod (Gadus morhua) in the southern Baltic and Atlantic herring (Clupea harengus) in the western Norwegian Sea, Skagerrak and the southern Baltic Sea. Our results showed that the stocks can be successfully discriminated based on their otolith shapes. We observed significant differences in the accuracy obtained by the tested classifiers.